CODE | LAS3021
TITLE | An Introduction to Data Science: Storage, Visualisation and Analysis
UM LEVEL | H - Higher Level
MQF LEVEL | 6
ECTS CREDITS | 4
DEPARTMENT | Centre for the Liberal Arts and Sciences
DESCRIPTION | The exponential growth of data in recent years has given rise to the field of Data Analytics & Data Science. Nowadays, data science is applied in every kind of industry: in biotech to find the next-generation vaccine, in fintech to find optimal trading models and payment systems, in social networks to understand the interaction between users, and so on. Data has outgrown the traditional analysis tools, and new data science techniques had to be developed to let the machine do its own analysis and forecasts.

This unit introduces Data Science using a practical approach, starting from the collection of data and moving through cleansing, storage and visualization to modelling the data and putting the model into production. Most techniques will be carried out using the Python language. Statistics such as correlation and hypothesis testing, along with visualizations, will be done both through Python libraries and in DAX in Power BI.

Some topics covered in this unit include:
- Regression and Prediction
- Classification, Hypothesis Testing and Deep Learning
- Recommendation Systems
- Predictive Modelling for Temporal Data

Unit Aims:

The aim of this unit is to provide a solid introduction to Data Science concepts using a practical approach. The entire pipeline is covered: collecting data, cleaning it, storing it, visualizing it, and eventually modelling it and putting the model into production. This unit will enable students to solve a number of data-related problems, for example:
- How can we intelligently recommend a film based on user interests (as in the Netflix recommendation system)?
- What is the fair price of a 3-bedroom apartment in a specific location based on previous data, and how much can the price vary?
- What is the optimal route an Uber taxi or a Bolt Food driver can take to reach the destination fastest and keep up with demand?

Learning Outcomes:

1. Knowledge & Understanding
By the end of the unit the student will be able to:
- Execute a data analysis project from start to finish. This includes completing practical worksheets in data processing, storage, visualization and modelling on real-world and artificial datasets;
- Use descriptive statistics (mean, mode, standard deviation, distributions etc.) to summarize a given dataset and formulate a hypothesis;
- Criticize or justify a decision based on the data at hand;
- Build mathematical models (e.g. linear regression) describing datasets;
- Use these models to predict real-world phenomena.

2. Skills
By the end of the unit the student will be able to:
- Program (and analyse datasets) in Python;
- Use the most common Python plotting libraries, such as Matplotlib, Plotly and Seaborn;
- Use Power BI visualisations and DAX measures;
- Use hypothesis testing to find whether differences in datasets are statistically significant;
- Use Jupyter notebooks/Google Codelabs.

Main Text/s and any supplementary readings:

Main Text:
- Machine Learning Mastery with Python: Understand Your Data, Create Accurate Models, and Work Projects End-to-End. Jason Brownlee.

Data Warehousing:
- The Data Warehouse Toolkit (Second Edition). Ralph Kimball and Margy Ross.

Visualization:
- The Definitive Guide to DAX: Business Intelligence with Microsoft Excel, SQL Server Analysis Services, and Power BI. Marco Russo and Alberto Ferrari. (The best book to learn DAX.)
- Information is Beautiful (2012). David McCandless. Contains many infographics and visualization examples. A follow-up book by the same author, Knowledge is Beautiful (2014), also exists.

Statistics:
- Statistical Methods for Machine Learning: Discover how to Transform Data into Knowledge with Python. Jason Brownlee.
- The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2011). Trevor Hastie, Robert Tibshirani, Jerome Friedman.
ADDITIONAL NOTES | Pre-requisite Qualifications: A strong interest in data and the ability to code in Python/R
STUDY-UNIT TYPE | Lecture and Practical
METHOD OF ASSESSMENT |
LECTURER/S | Andrew Sammut
The University makes every effort to ensure that the published Course Plans, Programmes of Study and Study-Unit information are complete and up to date at the time of publication. The University reserves the right to make changes if errors are detected after publication.
The availability of optional units may be subject to timetabling constraints. Units not attracting a sufficient number of registrations may be withdrawn without notice. All the information in the description above applies to study-units available during the academic year 2024/5 and may be subject to change in subsequent years.